Harrisonburg
Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models
Bazaga, Adrián, Blloshmi, Rexhina, Byrne, Bill, de Gispert, Adrià
Large Language Models (LLMs) have emerged as powerful tools for generating coherent text, understanding context, and performing reasoning tasks. However, they struggle with temporal reasoning, which requires processing time-related information such as event sequencing, durations, and inter-temporal relationships. These capabilities are critical for applications including question answering, scheduling, and historical analysis. In this paper, we introduce TISER, a novel framework that enhances the temporal reasoning abilities of LLMs through a multi-stage process that combines timeline construction with iterative self-reflection. Our approach leverages test-time scaling to extend the length of reasoning traces, enabling models to capture complex temporal dependencies more effectively. This strategy not only boosts reasoning accuracy but also improves the traceability of the inference process. Experimental results demonstrate state-of-the-art performance across multiple benchmarks, including out-of-distribution test sets, and reveal that TISER enables smaller open-source models to surpass larger closed-weight models on challenging temporal reasoning tasks.
- North America > United States > Kansas (0.05)
- North America > United States > California > San Francisco County > San Francisco (0.05)
- North America > United States > Connecticut > Hartford County > Bristol (0.04)
- (12 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Temporal Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)
Beyond Reweighting: On the Predictive Role of Covariate Shift in Effect Generalization
Jin, Ying, Egami, Naoki, Rothenhäusler, Dominik
Many existing approaches to generalizing statistical inference amidst distribution shift operate under the covariate shift assumption, which posits that the conditional distribution of unobserved variables given observable ones is invariant across populations. However, recent empirical investigations have demonstrated that adjusting for shift in observed variables (covariate shift) is often insufficient for generalization. In other words, covariate shift does not typically "explain away" the distribution shift between settings. As such, addressing the unknown yet non-negligible shift in the unobserved variables given observed ones (conditional shift) is crucial for generalizable inference. In this paper, we present empirical evidence from two large-scale multi-site replication studies to support a new role of covariate shift in "predicting" the strength of the unknown conditional shift. Analyzing 680 studies across 65 sites, we find that even though the conditional shift is non-negligible, its strength can often be bounded by that of the observable covariate shift. However, this pattern only emerges when the two sources of shift are quantified by our proposed standardized, "pivotal" measures. We then interpret this phenomenon by connecting it to similar patterns that can be theoretically derived from a random distribution shift model. Finally, we demonstrate that exploiting the predictive role of covariate shift leads to reliable and efficient uncertainty quantification for target estimates in generalization tasks with partially observed data. Overall, our empirical and theoretical analyses suggest a new way to approach the problem of distributional shift, generalizability, and external validity.
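For readers unfamiliar with the reweighting baseline that the abstract argues is often insufficient, the following is a minimal sketch of covariate-shift adjustment with a single discrete covariate. It is not the paper's proposed "pivotal" measure, which the abstract does not define. The function name `reweighted_mean` and the toy data layout are assumptions made for illustration.

```python
from collections import Counter


def reweighted_mean(source_x, source_y, target_x):
    """Estimate the target-population mean outcome from source data.

    Each source observation is weighted by p_target(x) / p_source(x),
    the ratio of empirical covariate frequencies. This adjusts only for
    shift in the observed covariate x (hypothetical illustration).
    """
    if len(source_x) != len(source_y):
        raise ValueError("source_x and source_y must have the same length")
    if not source_x or not target_x:
        raise ValueError("source and target samples must be non-empty")

    n_source, n_target = len(source_x), len(target_x)
    source_counts = Counter(source_x)
    target_counts = Counter(target_x)

    # Density ratio per source observation; zero if x never occurs in target.
    weights = [
        (target_counts[x] / n_target) / (source_counts[x] / n_source)
        for x in source_x
    ]
    total = sum(weights)
    if total == 0:
        raise ValueError("source and target covariates do not overlap")

    # Self-normalized weighted average of the source outcomes.
    return sum(w * y for w, y in zip(weights, source_y)) / total
```

If the conditional distribution of the outcome given x also differs between sites, this estimate stays biased however accurate the weights are, which is the gap the abstract addresses.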
- North America > United States > Wisconsin > Dane County > Madison (0.14)
- North America > United States > Virginia > Albemarle County > Charlottesville (0.14)
- North America > United States > Florida > Alachua County > Gainesville (0.14)
- (32 more...)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
- Government (0.92)
- Health & Medicine (0.67)
Evaluation of OpenAI o1: Opportunities and Challenges of AGI
Zhong, Tianyang, Liu, Zhengliang, Pan, Yi, Zhang, Yutong, Zhou, Yifan, Liang, Shizhe, Wu, Zihao, Lyu, Yanjun, Shu, Peng, Yu, Xiaowei, Cao, Chao, Jiang, Hanqi, Chen, Hanxu, Li, Yiwei, Chen, Junhao, Hu, Huawen, Liu, Yihen, Zhao, Huaqin, Xu, Shaochen, Dai, Haixing, Zhao, Lin, Zhang, Ruidong, Zhao, Wei, Yang, Zhenyuan, Chen, Jingyuan, Wang, Peilong, Ruan, Wei, Wang, Hui, Zhao, Huan, Zhang, Jing, Ren, Yiming, Qin, Shihuan, Chen, Tong, Li, Jiaxi, Zidan, Arif Hassan, Jahin, Afrar, Chen, Minheng, Xia, Sichen, Holmes, Jason, Zhuang, Yan, Wang, Jiaqi, Xu, Bochen, Xia, Weiran, Yu, Jichao, Tang, Kaibo, Yang, Yaxuan, Sun, Bolun, Yang, Tao, Lu, Guoyu, Wang, Xianqiao, Chai, Lilong, Li, He, Lu, Jin, Sun, Lichao, Zhang, Xin, Ge, Bao, Hu, Xintao, Zhang, Lian, Zhou, Hua, Zhang, Lu, Zhang, Shu, Liu, Ninghao, Jiang, Bei, Kong, Linglong, Xiang, Zhen, Ren, Yudan, Liu, Jun, Jiang, Xi, Bao, Yu, Zhang, Wei, Li, Xiang, Li, Gang, Liu, Wei, Shen, Dinggang, Sikora, Andrea, Zhai, Xiaoming, Zhu, Dajiang, Liu, Tianming
This comprehensive study evaluates the performance of OpenAI's o1-preview large language model across a diverse array of complex reasoning tasks, spanning multiple domains, including computer science, mathematics, natural sciences, medicine, linguistics, and social sciences. Through rigorous testing, o1-preview demonstrated remarkable capabilities, often achieving human-level or superior performance in areas ranging from coding challenges to scientific reasoning and from language processing to creative problem-solving. Key findings include: an 83.3% success rate in solving complex competitive programming problems, surpassing many human experts; superior ability in generating coherent and accurate radiology reports, outperforming other evaluated models; 100% accuracy in high school-level mathematical reasoning tasks, with detailed step-by-step solutions; advanced natural language inference capabilities across general and specialized domains such as medicine; impressive performance in chip design tasks, outperforming specialized models in areas such as EDA script generation and bug analysis; remarkable proficiency in anthropology and geology, demonstrating deep understanding and reasoning in these specialized fields; strong capabilities in quantitative investing, backed by comprehensive financial knowledge and statistical modeling skills; and effective performance in social media analysis, including sentiment analysis and emotion recognition. The model excelled particularly in tasks requiring intricate reasoning and knowledge integration across various fields. While some limitations were observed, including occasional errors on simpler problems and challenges with certain highly specialized concepts, the overall results indicate significant progress towards artificial general intelligence.
- North America > United States > California > Los Angeles County > Los Angeles (0.27)
- North America > United States > Georgia > Clarke County > Athens (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.13)
- (31 more...)
- Research Report > Promising Solution (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- (2 more...)
- Leisure & Entertainment (1.00)
- Information Technology (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- (12 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.70)
Autonomous Hiking Trail Navigation via Semantic Segmentation and Geometric Analysis
Reed, Camndon, Tatsch, Christopher, Gross, Jason N., Gu, Yu
Natural environments pose significant challenges for autonomous robot navigation, particularly due to their unstructured and ever-changing nature. Hiking trails, with their dynamic conditions influenced by weather, vegetation, and human traffic, represent one such challenge. This work introduces a novel approach to autonomous hiking trail navigation that balances trail adherence with the flexibility to adapt to off-trail routes when necessary. The solution is a Traversability Analysis module that integrates semantic data from camera images with geometric information from LiDAR to create a comprehensive understanding of the surrounding terrain. A planner uses this traversability map to navigate safely, adhering to trails while allowing off-trail movement when necessary to avoid on-trail hazards or for safe off-trail shortcuts. The method is evaluated through simulation to determine the balance between semantic and geometric information in traversability estimation. These simulations tested various weights to assess their impact on navigation performance across different trail scenarios. Weights were then validated through field tests at the West Virginia University Core Arboretum, demonstrating the method's effectiveness in a real-world environment.
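The weighting between semantic and geometric information described above can be sketched as a per-cell blend of two cost grids. This is a minimal illustration, not the authors' Traversability Analysis module: the function name `fuse_traversability`, the convention that costs lie in [0, 1] with 0 meaning easy to traverse, and the single weight `w_semantic` are all assumptions.

```python
def fuse_traversability(semantic, geometric, w_semantic=0.5):
    """Blend two same-shaped cost grids cell by cell.

    semantic:   2D list of costs derived from camera segmentation (assumed).
    geometric:  2D list of costs derived from LiDAR geometry (assumed).
    w_semantic: weight on the semantic cost; the geometric cost receives
                the remaining 1 - w_semantic.
    """
    if not 0.0 <= w_semantic <= 1.0:
        raise ValueError("w_semantic must be between 0 and 1")
    if len(semantic) != len(geometric):
        raise ValueError("grids must have the same number of rows")

    fused = []
    for sem_row, geo_row in zip(semantic, geometric):
        if len(sem_row) != len(geo_row):
            raise ValueError("grids must have the same number of columns")
        # Convex combination keeps the fused cost in the same [0, 1] range.
        fused.append([
            w_semantic * s + (1.0 - w_semantic) * g
            for s, g in zip(sem_row, geo_row)
        ])
    return fused
```

Sweeping `w_semantic` from 0 to 1 in a sketch like this mirrors the kind of weight comparison the abstract says was run in simulation before field validation.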
- North America > United States > West Virginia (0.25)
- South America > Brazil (0.04)
- North America > United States > Virginia > Harrisonburg (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
Deep Neural Network Identification of Limnonectes Species and New Class Detection Using Image Data
Xu, Li, Hong, Yili, Smith, Eric P., McLeod, David S., Deng, Xinwei, Freeman, Laura J.
As is true of many complex tasks, the work of discovering, describing, and understanding the diversity of life on Earth (viz., biological systematics and taxonomy) requires many tools. Some of this work can be accomplished as it has been done in the past, but some aspects present us with challenges which traditional knowledge and tools cannot adequately resolve. One such challenge is presented by species complexes in which the morphological similarities among the group members make it difficult to reliably identify known species and detect new ones. We address this challenge by developing new tools using the principles of machine learning to resolve two specific questions related to species complexes. The first question is formulated as a classification problem in statistics and machine learning and the second question is an out-of-distribution (OOD) detection problem. We apply these tools to a species complex comprising Southeast Asian stream frogs (Limnonectes kuhlii complex) and employ a morphological character (hind limb skin texture) traditionally treated qualitatively in a quantitative and objective manner. We demonstrate that deep neural networks can successfully automate the classification of an image into a known species group for which it has been trained. We further demonstrate that the algorithm can successfully classify an image into a new class if the image does not belong to the existing classes. Additionally, we use the larger MNIST dataset to test the performance of our OOD detection algorithm. We finish our paper with some concluding remarks regarding the application of these methods to species complexes and our efforts to document true biodiversity. This paper has online supplementary materials.
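One common way to combine classification with new-class detection is to threshold the classifier's top softmax probability. The sketch below uses that baseline purely for illustration; the abstract does not say which OOD detection algorithm the authors use. The function names, the label strings, and the 0.7 threshold are assumptions.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_or_flag(logits, labels, threshold=0.7):
    """Return the predicted label, or "new class" when confidence is low.

    logits:    one raw score per known class (hypothetical network output).
    labels:    names of the known classes, in the same order as logits.
    threshold: minimum top probability required to accept a known class.
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "new class"
    return labels[best]
```

In practice the threshold would be tuned on held-out images, since a network can also be confidently wrong on inputs from an unseen species.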
- Asia > Thailand (0.04)
- Asia > Vietnam (0.04)
- Asia > Southeast Asia (0.04)
- (12 more...)
Modeling Supply and Demand in Public Transportation Systems
Bihler, Miranda, Nelson, Hala, Okey, Erin, Rivas, Noe Reyes, Webb, John, White, Anna
We propose two neural-network-based, data-driven supply and demand models to analyze efficiency, identify service gaps, and determine the significant predictors of demand in the bus system of the Department of Public Transportation (HDPT) in Harrisonburg City, Virginia, which is home to James Madison University (JMU). The supply and demand models, one temporal and one spatial, take many variables into account, including the demographic data surrounding the bus stops, the metrics that the HDPT reports to the federal government, and the drastic change in population between when JMU is in and out of session. These direct, data-driven models for quantifying supply and demand and identifying service gaps can generalize to other cities' bus systems. Keywords: transportation systems, bus systems, public transportation, direct ridership models, data driven models, mathematical modeling, neural networks, machine learning, supply models, demand models, service gaps, social vulnerability, public transportation access, GIS data, data science, data quality.
- North America > Canada > Ontario > Hamilton (0.14)
- North America > United States > Virginia > Harrisonburg (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- (8 more...)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
Virginia 'shopping cart killer' case flags dating app dangers: They're a 'toy store' for murderers
Crime Stoppers of Houston's Andy Kahan and FOP national vice president Joe Gamaldi react to the nation's growing crime crisis on 'Justice w/ Judge Jeanine.' A potential fifth victim has been identified in the "shopping cart killer" case, involving an alleged serial killer in Northern Virginia, that has crime experts warning of the dangers of online dating. Officers believe suspect Anthony Robinson made contact with the victims via dating websites, which Crime Stoppers of Houston's Andy Kahan described on "Justice w/ Judge Jeanine" as "toy stores" for murderers. "The dark side of online dating apps are luring in millions of women to, perhaps… mortal danger," he said. "There are no background checks; we all know sex offenders troll it. You're essentially playing Russian roulette with your life when you divulge personal information and continue to go out and meet people that you do not know."
- North America > United States > Virginia > Harrisonburg (0.06)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.06)
Improving mathematical questioning in teacher training
Datta, Debajyoti, Phillips, Maria, Bywater, James P, Chiu, Jennifer, Watson, Ginger S., Barnes, Laura E., Brown, Donald E
High-fidelity, AI-based simulated classroom systems enable teachers to rehearse effective teaching strategies. However, dialogue-oriented open-ended conversations such as teaching a student about scale factors can be difficult to model. This paper builds a text-based interactive conversational agent to help teachers practice mathematical questioning skills based on the well-known Instructional Quality Assessment. We take a human-centered approach to designing our system, relying on advances in deep learning, uncertainty quantification, and natural language processing while acknowledging the limitations of conversational agents for specific pedagogical needs. Using experts' input directly during the simulation, we demonstrate how conversation success rate and high user satisfaction can be achieved.
- North America > United States > Virginia > Albemarle County > Charlottesville (0.16)
- North America > United States > Virginia > Harrisonburg (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report (0.51)
- Questionnaire & Opinion Survey (0.48)
Introduction To Deep Learning Coursera Github Hse
Courses: The major educational initiative of the JHUDSL is to create open-source online courses delivered through a range of platforms including YouTube, GitHub, Leanpub, and Coursera. Welcome to the "Introduction to Deep Learning" course! In the first week you'll learn about linear models and stochastic optimization methods. Please note that this is an advanced course and we assume basic knowledge of machine learning. I am currently working as a data science researcher and trainee at Jheronimus Academy of Data Science.
- North America > United States > Illinois (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (7 more...)
- Education > Educational Technology > Educational Software > Computer Based Training (1.00)
- Education > Educational Setting > Online (1.00)